ADAPTIVE BAYESIAN ESTIMATION USING A GAUSSIAN
RANDOM FIELD WITH INVERSE GAMMA BANDWIDTH

By A. W. van der Vaart and J. H. van Zanten

Vrije Universiteit Amsterdam 

We consider nonparametric Bayesian estimation using a rescaled smooth
Gaussian field as a prior for a multidimensional function. The rescaling
is achieved using a Gamma variable and the procedure can be viewed as
choosing an inverse Gamma bandwidth. The procedure is studied from a
frequentist perspective in three statistical settings involving replicated
observations (density estimation, regression and classification). We prove
that the resulting posterior distribution shrinks to the distribution that
generates the data at a speed which is minimax-optimal up to a logarithmic
factor, whatever the regularity level of the data-generating distribution.
Thus the hierarchical Bayesian procedure, with a fixed prior, is shown to
be fully adaptive.

1. Introduction. The quality of nonparametric estimators of densities
or regression functions is well known to depend on the regularity of the
true density or regression function. Given $n$ independent observations on
a function of $d$ arguments that is only known to be $\alpha$-smooth, the
precision of estimation is of the order $n^{-\alpha/(2\alpha+d)}$. Initially this was
shown using estimators that depend explicitly on the regularity level $\alpha$,
but later it was shown that the optimal rate can be achieved for all levels
of regularity simultaneously. Estimators that are rate optimal for every
regularity level are called adaptive. Cross validation, thresholding,
penalization and blocking are typical methods to construct such estimators
(see, e.g., [1, 2, 6, 10, 11, 12, 13, 19, 33, 34, 35, 37] and [42]).

Adaptive methods often employ a scale of estimators indexed by a bandwidth
parameter and adapt by making a data-dependent choice of the bandwidth.
Within a Bayesian context it is natural to put a prior on such a
bandwidth parameter and let the bandwidth be chosen through the posterior
distribution. In this paper we discuss a particularly attractive Bayesian
scheme, and show that this yields estimators that are adaptive up to a
logarithmic factor.

Our scheme employs a fixed prior distribution, constructed by rescaling
a smooth Gaussian random field. There is some (but not much) freedom
in the choice of Gaussian field and scaling factor. One possible choice is
the squared exponential process combined with an inverse Gamma bandwidth.
The squared exponential process is the centered Gaussian process
$W = \{W_t : t \in \mathbb{R}^d\}$ with covariance function, for $\|\cdot\|$ the Euclidean norm on $\mathbb{R}^d$,

(1.1) $E W_s W_t = \exp(-\|t - s\|^2)$.

The Gaussian field $W$ is well known to have a version with infinitely smooth
sample paths $t \mapsto W_t$. To make it suitable as a prior for $\alpha$-smooth
functions we rescale the sample paths by an independent random variable $A$
distributed as the $d$th root of a Gamma variable. As a prior distribution for
a function on the domain $[0,1]^d$ we consider the law of the process

$\{W_{At} : t \in [0,1]^d\}$.

The inverse $1/A$ of the variable $A$ can be viewed as a bandwidth parameter.
For large $A$ the prior sample path $t \mapsto W_{At}$ is obtained by shrinking the long
sample path $t \mapsto W_t$ indexed by $t \in [0,A]^d$ to the unit cube $[0,1]^d$. Thus it
employs "more randomness" and becomes suitable as a prior model for less
regular functions if $A$ is large.
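As an illustration of the construction just described, the following is a minimal sketch of one draw from the prior for $d = 1$ on a finite grid. It is not taken from the paper: the Gamma parameters `shape` and `rate`, the diagonal `jitter` added for numerical stability, and all function names are assumptions made for the example.

```python
import math
import random


def se_cov(s, t, a):
    """Covariance of the rescaled process t -> W_{a t} for d = 1: exp(-a^2 (s - t)^2)."""
    return math.exp(-((a * (s - t)) ** 2))


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(matrix)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = matrix[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return low


def sample_prior_path(grid, shape=1.0, rate=1.0, jitter=1e-6, rng=random):
    """Draw A from a Gamma law (d = 1, so A^d = A) and then W_{A t} on the grid."""
    # random.gammavariate takes (shape, scale); the scale is 1 / rate.
    a = rng.gammavariate(shape, 1.0 / rate)
    n = len(grid)
    # Hypothetical jitter on the diagonal keeps the factorization stable.
    cov = [[se_cov(grid[i], grid[j], a) + (jitter if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    path = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
    return a, path
```

Large draws of `a` give rougher-looking paths on the grid and small draws give nearly constant ones, which is the "more randomness" effect described above.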

The effect of scaling the prior was already noted in [47], who showed
(for $d = 1$) that a deterministic scaling by the "usual" bandwidth $1/A =
n^{-1/(2\alpha+1)}$ produces priors that are suitable models for $\alpha$-regular functions.
The main contribution of the present paper is to show that a single inverse
Gamma bandwidth gives a scaling that is suitable for every regularity level
$\alpha$ simultaneously. Furthermore, we extend the earlier results to multivariate
functions, and show that the procedure also adapts to a scale of infinitely 
smooth functions, of the type considered in [4, 20, 22, 23] and [32]. The 
proofs of several lemmas have common elements with [47], but the main 
result is proved from first principles. 

Of course, a (rescaled) Gaussian random field is not a suitable model 
for a density or a binary regression function. Following other authors we 
transform it for these applications by exponentiation and renormalization, 
or by application of a link function. These transformations and the statisti- 
cal consequences for these settings are given in Section 2, together with the 
application to the regression model. In Section 3 we state a more abstract 
result on rescaled Gaussian random fields, which gives the common structure
to the three statistical applications. This abstract result also applies to
other statistical settings, not discussed in this paper, and concerns Gaussian
random fields more general than the squared exponential process, and
bandwidths more general than the inverse Gamma. Proofs are deferred to 
Sections 4 and 5. 

We consider only compactly supported functions as parameters, even 
though the priors in principle are functions on the full Euclidean space. 
Consistency of a posterior on the full space can be expected only if the tails 
of the functions are restricted. If they are not, then one would still expect 
that the posterior restricted to compact subsets contracts at some rate. At 
the moment there seem to exist no results that would yield such a rate (or 
even consistency). 

1.1. Notation. Let $C[0,1]^d$ and $C^\alpha[0,1]^d$ be the space of all continuous
functions and the Hölder space of $\alpha$-smooth functions $f : [0,1]^d \to \mathbb{R}$,
respectively, equipped with the uniform norm $\|\cdot\|_\infty$ (cf. [45], Section 2.7.1).
Let $\mathcal{A}^{\gamma,r}(\mathbb{R}^d)$ be the space of functions $f : \mathbb{R}^d \to \mathbb{R}$ with Fourier transform
$\hat f(\lambda) = (2\pi)^{-d} \int e^{i(\lambda,t)} f(t)\,dt$ satisfying $\int e^{\gamma\|\lambda\|^r} |\hat f|^2(\lambda)\,d\lambda < \infty$. These
functions are infinitely often differentiable and "increasingly smooth" as $\gamma$ or $r$
increase; they extend to functions that are analytic on a strip in $\mathbb{C}^d$
containing $\mathbb{R}^d$ if $r = 1$ and to entire functions if $r > 1$ (see, e.g., [3], Theorem
8.3.5).

2. Main results. In this section we present the main results for three 
different statistical settings: i.i.d. density estimation, fixed design regression 
and classification. The proofs of these results are consequences of a theorem 
on rescaled Gaussian processes in Section 3, general posterior convergence 
rate results from [16] and [17] and results mapping the three settings to these 
general results given in [46]. The process $W$ and variable $A^d$ in this section
are taken to be the squared exponential Gaussian field and an independent 
random variable with a Gamma distribution. For W and A satisfying the 
more general conditions given in Section 3, the same results are true, except 
for the fact that the powers of the logarithmic factors may be different. 

2.1. Density estimation. After exponentiation and renormalization a ran- 
domly rescaled Gaussian process can be used as a prior model for probability 
densities. Priors of this type were, among others, considered by [29, 30] and 
[31]. Posterior consistency was recently obtained in the paper [43]. 

To describe our adaptation result, consider a sample $X_1, \ldots, X_n$ from a
continuous, positive density $f_0$ on the unit cube $[0,1]^d \subset \mathbb{R}^d$. As a prior
distribution $\Pi$ on $f_0$ we use the distribution of

(2.1) $t \mapsto \dfrac{e^{W_{At}}}{\int_{[0,1]^d} e^{W_{As}}\,ds}$.

Let $\Pi(f \in \cdot \mid X_1, \ldots, X_n)$ denote the posterior distribution: the conditional
distribution of $f$ on the Borel sets in $C[0,1]^d$ in the Bayesian setup, where
the density $f$ is first drawn from the prior (2.1) and given $f$ the variables
$X_1, \ldots, X_n$ are an i.i.d. sample from $f$. We say that the posterior contracts at rate
$\varepsilon_n$ if, for every sufficiently large constant $M$, as $n \to \infty$,

$\Pi(f : h(f, f_0) > M\varepsilon_n \mid X_1, \ldots, X_n) \to 0$.

Here $h$ is the Hellinger distance and the convergence is understood to be in
probability under the (frequentist) assumption that $X_1, \ldots, X_n$ are a random
sample from $f_0$.
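The transformation by exponentiation and renormalization, and the Hellinger distance used to measure contraction, can be sketched on a uniform grid for $d = 1$. This is an illustrative approximation by Riemann sums, not the authors' code; the function names are hypothetical, and the Hellinger distance is taken here without a factor $1/2$, which is an assumed convention.

```python
import math


def path_to_density(values, cell_width):
    """Exponentiate path values on a uniform grid and renormalize so that
    the Riemann sum of the result equals one."""
    shift = max(values)  # subtracting the maximum avoids overflow; it cancels in the ratio
    weights = [math.exp(v - shift) for v in values]
    total = sum(weights) * cell_width
    return [w / total for w in weights]


def hellinger(f, g, cell_width):
    """Riemann-sum approximation of (integral of (sqrt f - sqrt g)^2)^(1/2)."""
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(f, g))
    return math.sqrt(squared * cell_width)
```

For instance, applying `path_to_density` to a constant path returns the uniform density on the grid, whatever the constant.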

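Theorem 2.1. Let $w_0 = \log f_0$.

• If $w_0 \in C^\alpha[0,1]^d$ for some $\alpha > 0$, then the posterior contracts at rate
$n^{-\alpha/(2\alpha+d)} (\log n)^{(4\alpha+d)/(4\alpha+2d)}$.

• If $w_0$ is the restriction of a function in $\mathcal{A}^{\gamma,r}(\mathbb{R}^d)$, then the posterior
contracts at rate $n^{-1/2}(\log n)^{d+1}$ if $r \ge 2$ and $n^{-1/2}(\log n)^{d+1+d/(2r)}$ if $r < 2$.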

The minimax rate of estimation of a density $f_0$ that is bounded away
from zero and known to belong to the space $C^\alpha[0,1]^d$ of $\alpha$-Hölder continuous
functions is $n^{-\alpha/(2\alpha+d)}$. The first assertion of the theorem shows that the
posterior contracts at the minimax rate times a logarithmic factor. It is
rate-adaptive in the sense that this is true for any $\alpha > 0$, even though the prior
does not depend on $\alpha$. We conjecture that a logarithmic factor in the rate for
the present prior is necessary, although the power $(4\alpha + d)/(4\alpha + 2d)$ may
not be optimal. As shown in Section 3 this power can be improved by using a
slightly different prior for $A$. Other Bayesian schemes (see, e.g., [18, 21] and
[28]) give adaptation without logarithmic factors, but are more complicated.
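To get a feel for the size of the logarithmic factor, the following small sketch evaluates the minimax rate $n^{-\alpha/(2\alpha+d)}$ and the same rate multiplied by $(\log n)^{(4\alpha+d)/(4\alpha+2d)}$, the power quoted in the paragraph above. The function names are illustrative and constants are ignored, so only the orders of magnitude are meaningful.

```python
import math


def minimax_rate(n, alpha, d):
    """Order of the minimax rate n^(-alpha / (2 alpha + d)), constants ignored."""
    return n ** (-alpha / (2.0 * alpha + d))


def log_power_holder(alpha, d):
    """Power of log n quoted in the text for the Hoelder case."""
    return (4.0 * alpha + d) / (4.0 * alpha + 2.0 * d)


def posterior_rate_holder(n, alpha, d):
    """Minimax rate times the logarithmic factor, constants ignored."""
    return minimax_rate(n, alpha, d) * math.log(n) ** log_power_holder(alpha, d)
```

The power returned by `log_power_holder` always lies strictly between $1/2$ and $1$, so under this reading the price of adaptation is less than one full factor of $\log n$.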

The second assertion shows that the rate improves to $1/\sqrt{n}$ times a
logarithmic factor if $\log f_0$ is the restriction of a function in $\mathcal{A}^{\gamma,r}(\mathbb{R}^d)$. The rate
is better if $r$ increases, but does not improve beyond $r = 2$, the exponent
of the spectral density of the squared exponential process. For a Gaussian
prior with a compactly supported spectral density, the rate would strictly
improve as $r$ increases, reaching the rate $n^{-1/2}(\log n)^{d+1}$ as $r \uparrow \infty$. Other
estimation schemes (see [4, 20, 22, 23] and [32]) can reach the better rate
$n^{-1/2}(\log n)^{(d+1)/2}$.

2.2. Fixed design regression. Suppose we observe independent variables
$Y_1, \ldots, Y_n$ satisfying the regression relation $Y_i = w_0(t_i) + \varepsilon_i$, for independent
$N(0, \sigma_0^2)$-distributed error variables $\varepsilon_i$ and known elements $t_1, \ldots, t_n$ of the
unit cube $[0,1]^d$. The aim is to estimate the regression function $w_0$. In this
case a rescaled Gaussian process can be used directly as a prior for $w_0$; cf.
[24, 49] and [40]. Posterior consistency for priors of this type was recently
established in [9].

We use the law of the random field $\{W_{At} : t \in [0,1]^d\}$ as a prior for $w_0$. If the
standard deviation $\sigma_0$ of the errors is unknown, we endow it with a prior
distribution as well, which we assume to be supported on a given interval
$[a,b] \subset (0,\infty)$ that contains $\sigma_0$, with a Lebesgue density that is bounded
away from zero.

We denote the posterior distribution by $\Pi(\cdot \mid Y_1, \ldots, Y_n)$. Let
$\|w\|_n = (n^{-1} \sum_{i=1}^n w^2(t_i))^{1/2}$ be the $L_2$-norm corresponding to the empirical distribution
of the design points. We say that the posterior contracts at rate $\varepsilon_n$ if, for
every sufficiently large $M$,

$\Pi((w, \sigma) : \|w - w_0\|_n + |\sigma - \sigma_0| > M\varepsilon_n \mid Y_1, \ldots, Y_n) \to 0$.



Theorem 2.2. The assertions of Theorem 2.1 are true in the setting of
regression for $w_0 = f_0$.

2.3. Classification. In the setting of classification, or binary regression, 
the use of rescaled Gaussian process priors was considered for instance in [7] 
and [40]. Consistency results were obtained in [14] and more recently in [8]. 

Consider i.i.d. observations $(X_1,Y_1), \ldots, (X_n,Y_n)$, where $X_i$ takes values
in the unit cube $[0,1]^d$ and $Y_i$ takes values in the set $\{0,1\}$. The statistical
problem is to estimate the binary regression function $r_0(t) = P(Y_i = 1 \mid X_i = t)$.

As a prior $\Pi$ on $r_0$ we use the law of the process $\{\Psi(W_{At}) : t \in [0,1]^d\}$,
where $\Psi : \mathbb{R} \to (0,1)$ is the logistic or the normal distribution function.

Let $\Pi(\cdot \mid (X_1,Y_1), \ldots, (X_n,Y_n))$ denote the posterior and let $\|\cdot\|_{L_2(G)}$ be
the $L_2$-norm relative to the marginal distribution $G$ of $X_i$. We say that the
posterior contracts at rate $\varepsilon_n$ if, for every sufficiently large $M$,

$\Pi(r : \|r - r_0\|_{L_2(G)} > M\varepsilon_n \mid (X_1,Y_1), \ldots, (X_n,Y_n)) \to 0$.



Theorem 2.3. Let $w_0 = \Psi^{-1}(r_0)$. Then the assertions of Theorem 2.1
are true.

3. Rescaled Gaussian fields. Let $W = (W_t : t \in \mathbb{R}^d)$ be a centered,
homogeneous Gaussian random field with covariance function of the form, for a
given continuous function $\phi : \mathbb{R}^d \to \mathbb{R}$,

(3.1) $E W_s W_t = \phi(s - t)$.

By Bochner's theorem there exists a finite Borel measure $\mu$ on $\mathbb{R}^d$, the
spectral measure of $W$, such that

(3.2) $\phi(t) = \int e^{-i(\lambda,t)}\,\mu(d\lambda)$.

We shall consider processes whose spectral measure $\mu$ has subexponential
tails: for some $\delta > 0$,

(3.3) $\int e^{\delta\|\lambda\|}\,\mu(d\lambda) < \infty$.

The squared exponential process, whose covariance function is given in (1.1),
falls in this class. Its spectral measure has density relative to the Lebesgue
measure given by $\lambda \mapsto \exp(-\|\lambda\|^2/4)/(2^d \pi^{d/2})$.
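The relation between this spectral density and the covariance (1.1) can be checked numerically for $d = 1$, where the density is $\exp(-\lambda^2/4)/(2\sqrt{\pi})$ and the Bochner integral should return $\exp(-t^2)$. The sketch below uses a plain trapezoid rule; the truncation `half_width` and the number of `steps` are arbitrary choices for the example, and the function names are hypothetical.

```python
import math


def se_spectral_density(lam):
    """Spectral density of the squared exponential process for d = 1."""
    return math.exp(-lam * lam / 4.0) / (2.0 * math.sqrt(math.pi))


def covariance_from_spectrum(t, half_width=40.0, steps=8000):
    """Trapezoid approximation of the integral of cos(lam * t) f(lam) d lam.

    The density is even, so the imaginary part of the Bochner integral
    vanishes and only the cosine term is needed."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        lam = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.cos(lam * t) * se_spectral_density(lam)
    return total * h
```

Comparing `covariance_from_spectrum(t)` with `math.exp(-t * t)` for a few values of `t` gives agreement up to quadrature error.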

For a positive random variable $A$ defined on the same probability space
as $W$ and stochastically independent of $W$ let $W^A = (W_{At} : t \in [0,1]^d)$ be
the restriction to $[0,1]^d$ of the rescaled process $t \mapsto W_{At}$. We consider it as
a Borel measurable map in the space $C[0,1]^d$, equipped with the uniform
norm $\|\cdot\|_\infty$. The following theorem bounds the small-ball probability and the
complexity of the support of the field $W^A$. These are the essential ingredients
for proving the statistical results in Section 2, and can also be used to analyse
other Bayesian schemes.

We assume that the distribution of $A$ possesses a Lebesgue density $g$
satisfying, for positive constants $C_1, D_1, C_2, D_2$, nonnegative constants $p, q$,
and every sufficiently large $a > 0$,

(3.4) $C_1 a^p \exp(-D_1 a^d \log^q a) \le g(a) \le C_2 a^p \exp(-D_2 a^d \log^q a)$.

This is satisfied (with $q = 0$) if $A^d$ possesses a Gamma distribution.

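For given sequences $\varepsilon_n$ and $\bar\varepsilon_n$ and a given function $w_0 : [0,1]^d \to \mathbb{R}$,
consider the following statement: there exist Borel measurable subsets $B_n$ of
$C[0,1]^d$ and a constant $K$ such that, for every sufficiently large $n$,

(3.5) $P(\|W^A - w_0\|_\infty \le \varepsilon_n) \ge e^{-n\varepsilon_n^2}$,

(3.6) $P(W^A \notin B_n) \le e^{-4n\varepsilon_n^2}$,

(3.7) $\log N(\bar\varepsilon_n, B_n, \|\cdot\|_\infty) \le n\bar\varepsilon_n^2$.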



Theorem 3.1. Let $W$ be a centered homogeneous Gaussian field with
spectral measure $\mu$ that satisfies (3.3) for some $\delta > 0$ and that possesses a
Lebesgue density $f$ such that $a \mapsto f(a\lambda)$ is decreasing on $(0,\infty)$ for every
$\lambda \in \mathbb{R}^d$:






• If $w_0 \in C^\alpha[0,1]^d$ for some $\alpha > 0$, then there exist Borel measurable subsets
$B_n$ of $C[0,1]^d$ such that (3.5), (3.6) and (3.7) hold, for every sufficiently
large $n$, and $\varepsilon_n = n^{-\alpha/(2\alpha+d)}(\log n)^{\kappa_1}$ and $\bar\varepsilon_n = K\varepsilon_n(\log n)^{\kappa_2}$, for $\kappa_1 =
((1 + d) \vee q)/(2 + d/\alpha)$ and $\kappa_2 = (1 + d - q)/2$ and a sufficiently large
constant $K$.

• If $w_0$ is the restriction of a function in $\mathcal{A}^{\gamma,r}(\mathbb{R}^d)$ to $[0,1]^d$ and the spectral
density satisfies $|f(\lambda)| \ge C_3 \exp(-D_3\|\lambda\|^\nu)$ for some positive constants
$C_3$, $D_3$ and $\nu$, then there exist Borel measurable subsets $B_n$ of $C[0,1]^d$
such that (3.5), (3.6) and (3.7) hold, for every sufficiently large $n$, and
$\varepsilon_n = Kn^{-1/2}(\log n)^{(d+1)/2}$ for $r \ge \nu$, $\varepsilon_n = Kn^{-1/2}(\log n)^{(d+1)/2+d/(2r)}$ for
$r < \nu$, and $\bar\varepsilon_n = \varepsilon_n(\log n)^{(d+1-q)/2}$, for a sufficiently large constant $K$.

In the paper [46] it is shown that (3.5)-(3.7) map one-to-one to the general
conditions on rates of contraction of posterior distributions used in [17] and
[16], for each of the three settings considered in Section 2. Thus a rate of
contraction $\varepsilon_n \vee \bar\varepsilon_n$ is attained for each of these three settings. Theorems
2.1-2.3 follow, with the parameter $q$ equal to 0. (The use of two rates $\varepsilon_n$
and $\bar\varepsilon_n$ requires a slight generalization of the main result in [17], formulated
as Theorem 2.1 in [15]; also see the discussion following the statement of
the main result in [16].) The choice $q = d + 1$ yields a slightly better rate (a
lower power on the logarithmic factor), but we highlighted the choice $q = 0$
in Section 2, as this corresponds to a Gamma prior.
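The effect of the choice of $q$ on the logarithmic powers can be made concrete with a few lines of arithmetic. The sketch below assumes the exponents of the Hölder case of Theorem 3.1 read $\kappa_1 = ((1+d) \vee q)/(2 + d/\alpha)$ and $\kappa_2 = (1 + d - q)/2$, and it only covers $0 \le q \le d + 1$; the function name is hypothetical.

```python
def log_powers(alpha, d, q):
    """Powers of log n in eps_n and in the extra factor of bar-eps_n
    (Hoelder case), under the reading of kappa_1 and kappa_2 stated above."""
    if not 0 <= q <= d + 1:
        raise ValueError("this sketch only covers 0 <= q <= d + 1")
    kappa1 = max(1 + d, q) / (2.0 + d / alpha)
    kappa2 = (1 + d - q) / 2.0
    return kappa1, kappa2
```

For any fixed `alpha` and `d`, moving from `q = 0` to `q = d + 1` leaves `kappa1` unchanged and reduces `kappa2` to zero, which is the lower power on the logarithmic factor mentioned in the text.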

4. Auxiliary results. In this section we prepare a number of auxiliary 
lemmas needed in the proof of Theorem 3.1. In the proof of (3.5) we condition 
on the variable A, so that we can first consider the probability in (3.5) for A 
a fixed constant, and then combine the obtained bound with bounds on the 
tails of the distribution of A. The proofs of (3.6) and (3.7) involve similar 
steps. 

For fixed $A$ the process $W^A$ is a Gaussian random field with values in
$C[0,1]^d$, and a key concept is the associated reproducing kernel Hilbert space
(RKHS). This can be viewed as a subset of the space $C[0,1]^d$, which gives the
"geometry" of the distribution of $W^A$, just as finite-dimensional Gaussian
vectors are described by ellipsoids. According to general Gaussian process
theory, obtaining good bounds for the probabilities in (3.5) and (3.6) for
fixed $A$ is closely linked to studying the metric entropy of the unit ball of
the RKHS and the approximation of the function $w_0$ by elements of the
RKHS. See [48] for a review.

In Lemma 4.1 we start by characterizing the RKHS of the process $W$, from
which the RKHS of the rescaled process will be obtained in Lemma 4.2.
The RKHS of a Gaussian field $(W_t : t \in T)$, with parameter set equal to
a set $T \subset \mathbb{R}^d$, is by definition the set of functions $h : T \to \mathbb{R}$ that can be
represented as $h(t) = E W_t L$ for $L$ contained in the closure of the linear span
of the variables $(W_t : t \in T)$ in $L_2(\Omega, \mathcal{U}, P)$, for $(\Omega, \mathcal{U}, P)$ the probability space
on which $W$ is defined, equipped with the square norm $\|h\|^2_{\mathbb{H}} = E L^2$.

Lemma 4.1. The RKHS of $(W_t : t \in T)$ is the set of real parts of the
functions (from $T$ to $\mathbb{C}$)

$t \mapsto \int e^{i(\lambda,t)}\psi(\lambda)\,\mu(d\lambda)$,

when $\psi$ runs through the complex Hilbert space $L_2(\mu)$. The RKHS-norm of
the displayed function equals the norm in $L_2(\mu)$ of the projection of $\psi$ on
the closed linear span of the set of functions $(e_s : s \in T)$ (or, equivalently,
the infimum of $\|\psi\|_2$ over all functions $\psi$ giving the same function in the
preceding display). If $T \subset \mathbb{R}^d$ has an interior point and (3.3) holds, then this
closed linear span is $L_2(\mu)$ and the RKHS norm is $\|\psi\|_{L_2(\mu)}$.

Proof. The spectral representation (3.2) can be written as $E W_s W_t =
\langle e_t, e_s \rangle_{L_2(\mu)}$, for $e_t$ the function defined by $e_t(\lambda) = \exp(i(\lambda,t))$. By definition
the RKHS is therefore the set of functions as in the display, with $\psi$ running
through the closure $L_T$ in $L_2(\mu)$ of the linear span of the set of functions
$(e_s : s \in T)$, and the norm equal to the norm of $\psi$ in $L_2(\mu)$. Here the "linear
span" is taken over the reals. If instead we take the linear span over the
complex numbers, we obtain complex functions whose real parts give the
RKHS.

The set of functions obtained by letting $\psi$ range over the full space $L_2(\mu)$ is
precisely the same, as a general element $\psi \in L_2(\mu)$ gives exactly the same
function as its projection $\Pi\psi$ on $L_T$. However, the associated norm is the
$L_2(\mu)$ norm of $\Pi\psi$. This proves the first assertion of the lemma. For the
second we must show that $L_T = L_2(\mu)$ under the additional conditions.

The partial derivative of order $(k_1, \ldots, k_d)$ with respect to $(t_1, \ldots, t_d)$ of
the map $t \mapsto e_t$ at $t_0$ is the function $\lambda \mapsto (i\lambda_1)^{k_1} \cdots (i\lambda_d)^{k_d} e_{t_0}(\lambda)$. Appealing
to the dominated convergence theorem we see that this derivative exists as
a derivative in $L_2(\mu)$. Because $t_0$ is an interior point of $T$ by assumption, we
conclude that the function $\lambda \mapsto (i\lambda)^k e_{t_0}(\lambda)$ belongs to $L_T$ for any multi-index
$k$ of nonnegative integers. Consequently, the function $p e_{t_0}$ belongs to $L_T$ for
any polynomial $p : \mathbb{R}^d \to \mathbb{C}$ in $d$ arguments. It suffices to show that these
functions are dense in $L_2(\mu)$.

Equivalently, it suffices to prove that the polynomials themselves are dense
in $L_2(\mu)$. Indeed, if $\psi \in L_2(\mu)$ is orthogonal to all functions of the form
$p e_{t_0}$, then $\psi \bar e_{t_0}$ is orthogonal to all polynomials. Denseness of the set of
polynomials then gives that $\psi \bar e_{t_0}$ vanishes $\mu$-almost everywhere, whence $\psi$
vanishes $\mu$-almost everywhere.

That the polynomials are dense in $L_2(\mu)$ appears to be well known. A
proof for $d = 1$ is given in [38]. For completeness we include a proof for
general dimension $d$. Suppose that $\psi \in L_2(\mu)$ is orthogonal to all polynomials.
Since $\mu$ is a finite measure, the complex conjugate $\bar\psi$ is $\mu$-integrable, and
hence we can define a complex measure $\nu$ by

$\nu(B) = \int_B \bar\psi(\lambda)\,\mu(d\lambda)$.

It suffices to show that $\nu$ is the zero measure, so that $\psi = 0$ almost
everywhere relative to $\mu$.

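By the Cauchy-Schwarz inequality and (3.3), with $|\nu|$ the (total) variation
measure of $\nu$,

(4.1) $\int e^{\delta\|\lambda\|/2}\,|\nu|(d\lambda) < \infty$.

By a standard argument, based on the dominated convergence theorem (see,
e.g., [3], Theorem 8.3.5), this implies that the function $z \mapsto \int e^{(\lambda,z)}\,\nu(d\lambda)$ is
analytic on the strip $\Omega = \{z \in \mathbb{C}^d : |\mathrm{Re}\,z_1| < \delta/(2\sqrt{d}), \ldots, |\mathrm{Re}\,z_d| < \delta/(2\sqrt{d})\}$.
Also for $z$ real and in this strip, by the dominated convergence theorem,

$\int e^{(\lambda,z)}\,\nu(d\lambda) = \int \sum_{n=0}^\infty \dfrac{(\lambda,z)^n}{n!}\,\nu(d\lambda) = \sum_{n=0}^\infty \int \dfrac{(\lambda,z)^n}{n!}\,\nu(d\lambda)$.

The right-hand side vanishes, because $\psi$ is orthogonal to all polynomials by
assumption.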

We conclude that the function $z \mapsto \int e^{(\lambda,z)}\,\nu(d\lambda)$ vanishes on the set $\{z \in
\Omega : \mathrm{Im}\,z = 0\}$. Because this set contains a nontrivial interval in $\mathbb{R}$ for
every coordinate, we can apply (repeated) analytic continuation to see that
this function vanishes on the complete strip $\Omega$. In particular the Fourier
transform $t \mapsto \int e^{i(\lambda,t)}\,\nu(d\lambda)$ of $\nu$ vanishes on all of $\mathbb{R}^d$, whence $\nu$ is the
zero-measure. □

For $W = (W_t : t \in \mathbb{R}^d)$ a homogeneous Gaussian random field with spectral
measure $\mu$ and a positive real number $a$, the rescaled process $(W_{at} : t \in \mathbb{R}^d)$
is also homogeneous and has spectral measure $\mu_a$ that is related to $\mu$ by

$\mu_a(B) = \mu(B/a)$.

If $\mu$ has a (spectral) density $f$, then $\mu_a$ has density $f_a$ given by

$f_a(\lambda) = a^{-d} f(\lambda/a)$.

We shall obtain approximation properties and small-ball probabilities for
the process $W^a = (W_{at} : t \in [0,1]^d)$, viewed as a map in $C[0,1]^d$. Let $\mathbb{H}^a$ be
the RKHS of $W^a$, with corresponding norm $\|\cdot\|_{\mathbb{H}^a}$. It is described in Lemma
4.1 with $\mu$ taken equal to $\mu_a$.

The following lemma follows from general principles, or can be proved
from the characterization of RKHSs given in Lemma 4.1. By "scaling map"
$h \mapsto (t \mapsto h(at))$ we mean the map that attaches to a given function
$h : [0,a]^d \to \mathbb{R}$ the function $g : [0,1]^d \to \mathbb{R}$ defined by $g(t) = h(at)$.

Lemma 4.2. The scaling map $h \mapsto (t \mapsto h(at))$ is an isometry from the
RKHS of the process $(W_t : t \in [0,a]^d)$ onto $\mathbb{H}^a$.

The next step is to bound the concentration function of the Gaussian
prior $W^a$, again for a fixed $a$. The concentration function (at $\varepsilon > 0$) is the
sum of minus the log centered small ball probability, considered in Lemma 4.6,
and the decentering function $\inf\{\|h\|^2_{\mathbb{H}^a} : \|h - w_0\|_\infty < \varepsilon\}$, which measures
the positioning of the true parameter $w_0$ relative to the RKHS. We start
by bounding the latter, separately for the cases that the true parameter is
Hölder or supersmooth in Lemmas 4.3 and 4.4. The first lemma is fairly
standard, and proceeds by approximating $w_0$ by a suitable convolution of
$w_0$ with a smooth function, which is contained in the RKHS.

Lemma 4.3. Assume that the restriction of $\mu$ to some neighborhood of
the origin is Lebesgue absolutely continuous with a density that is bounded
away from zero. Let $\alpha > 0$ be given. Then for any $w \in C^\alpha[0,1]^d$ there exist
constants $C$ and $D$ depending only on $\mu$ and $w$ such that, as $a \to \infty$,

$\inf\{\|h\|^2_{\mathbb{H}^a} : \|h - w\|_\infty \le Ca^{-\alpha}\} \le Da^d$.

Proof. Let $\underline{\alpha}$ be the biggest integer strictly smaller than $\alpha$. Let $G$ be
a bounded neighborhood of the origin on which $\mu$ has a Lebesgue density $f$
that is bounded away from 0. Take a function $\psi : \mathbb{R} \to \mathbb{C}$ with a symmetric,
real-valued, infinitely smooth Fourier transform $\hat\psi$ that is supported on an
interval $I$ such that $I^d \subset G$ and which equals $1/(2\pi)$ in a neighborhood of
zero, so that $\psi$ has moments of all orders and

$\int (it)^k \psi(t)\,dt = 2\pi\hat\psi^{(k)}(0) = 1$ if $k = 0$, and $= 0$ if $k \ge 1$.

Define $\phi : \mathbb{R}^d \to \mathbb{C}$ by $\phi(t) = \psi(t_1) \cdots \psi(t_d)$. Then we have that $\int \phi(t)\,dt = 1$,
and $\int t^k \phi(t)\,dt = 0$, for any nonzero multi-index $k = (k_1, \ldots, k_d)$ of
nonnegative integers. Moreover, we have that $\int \|t\|^\alpha |\phi|(t)\,dt < \infty$, and the functions
$|\hat\phi|/f$ and $|\hat\phi|^2/f$ are uniformly bounded.

By Whitney's theorem we can extend $w : [0,1]^d \to \mathbb{R}$ to a function
$w : \mathbb{R}^d \to \mathbb{R}$ with compact support and $\|w\|_\alpha < \infty$. (See [50] or [41],
Chapter VI; we can multiply an arbitrary smooth extension by an infinitely
smooth function that vanishes outside a neighborhood of $[0,1]^d$ to ensure
compact support.)

By Taylor's theorem we can write, for $s, t \in \mathbb{R}^d$,

$w(t + s) = \sum_{j : j_\cdot \le \underline{\alpha}} D^j w(t)\,\dfrac{s^j}{j!} + S(t,s)$,

where

$|S(t,s)| \le C\|s\|^\alpha$

for a positive constant $C$ that depends on $w$ but not on $s$ and $t$. If we set
$\phi_a(t) = \phi(at)$ we get, in view of the fact that $\phi$ is a higher-order kernel, for
any $t \in \mathbb{R}^d$,

$a^d(\phi_a * w)(t) - w(t) = \int \phi(s)(w(t - s/a) - w(t))\,ds = \int \phi(s) S(t, -s/a)\,ds$.

Combining the preceding displays shows that $\|a^d \phi_a * w - w\|_\infty \le KCa^{-\alpha}$
for $K = \int \|s\|^\alpha |\phi|(s)\,ds$.

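For $\hat w$ the Fourier transform of $w$, we can write

$a^d(\phi_a * w)(t) = (2\pi)^d \int e^{-i(\lambda,t)}\,\dfrac{\hat w(\lambda)\hat\phi(\lambda/a)}{f_a(\lambda)}\,\mu_a(d\lambda)$.

Therefore, by Lemma 4.1 the function $a^d \phi_a * w$ is contained in the RKHS
$\mathbb{H}^a$, with square norm a multiple of, with $\Pi$ the orthogonal projection in
$L_2(\mu_a)$ onto the functions $(e_t : t \in [0,1]^d)$,

$\left\|\Pi\!\left(\dfrac{\hat w\,\hat\phi(\cdot/a)}{f_a}\right)\right\|^2_{L_2(\mu_a)} \le \int \dfrac{|\hat w(\lambda)\hat\phi(\lambda/a)|^2}{f_a(\lambda)}\,d\lambda \le a^d \int |\hat w(\lambda)|^2\,d\lambda\; \left\|\dfrac{|\hat\phi|^2}{f}\right\|_\infty$.

Here $(2\pi)^d \int |\hat w|^2(\lambda)\,d\lambda = \int |w|^2(t)\,dt$ is finite, and $|\hat\phi|^2/f$ is bounded by the
construction of $\phi$. □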



The supersmooth case consists of the subcase that wq is "super-super 
smooth," that is, it belongs itself to the RKHS, and the more regular case 
in which it is approximated by its "projection" in the RKHS. 

Lemma 4.4. Assume that $\mu$ has a Lebesgue density $f$ such that $|f(\lambda)| \ge
C_3 \exp(-D_3\|\lambda\|^\nu)$ for some positive constants $C_3$, $D_3$ and $\nu$.

• If $w$ is the restriction to $[0,1]^d$ of an element of $\mathcal{A}^{\gamma,r}(\mathbb{R}^d)$ for $r \ge \nu$, then
$w \in \mathbb{H}^a$ for all sufficiently large $a$ with uniformly bounded norm $\|w\|_{\mathbb{H}^a}$.

• If $w$ is the restriction to $[0,1]^d$ of an element of $\mathcal{A}^{\gamma,r}(\mathbb{R}^d)$ for $r < \nu$, then
there exist constants $a_0$, $C$ and $D$ depending only on $\mu$ and $w$ such that,
for $a > a_0$,

$\inf\{\|h\|^2_{\mathbb{H}^a} : \|h - w\|_\infty \le Ce^{-\gamma a^r} a^{-r+1}\} \le Da^d$.

Proof. The Fourier transform of a function $w \in \mathcal{A}^{\gamma,r}(\mathbb{R}^d)$ is certainly
integrable, and hence, by the inversion formula,

$w(t) = \int e^{-i(\lambda,t)}\hat w(\lambda)\,d\lambda = \int e^{-i(\lambda,t)}\,\dfrac{\hat w}{f_a}(\lambda)\,\mu_a(d\lambda)$.

In view of Lemma 4.1 $w \in \mathbb{H}^a$ if $\hat w/f_a \in L_2(\mu_a)$. Now

J Ja J <-^3 

This is finite for every $a > 0$ if $r > \nu$. If $r = \nu$, then this is finite for $a \ge
(D_3/\gamma)^{1/\nu}$. In both cases the right side is bounded as $a \to \infty$.

To prove the second assertion let $\phi$ be as in the proof of Lemma 4.3, with
compactly supported Fourier transform $\hat\phi$ constructed to be constant and
equal to $(2\pi)^{-d}$ on $[-1,1]^d$, and bounded in absolute value by this constant
everywhere. By the argument given in this proof the function $a^d \phi_a * w$ is
contained in $\mathbb{H}^a$ with square norm bounded above by a multiple of $a^d$, for
sufficiently large $a$. Also

$|a^d \phi_a * w(t) - w(t)|^2 = \left|\int e^{-i(\lambda,t)}\left((2\pi)^d \hat\phi(\lambda/a) - 1\right)\hat w(\lambda)\,d\lambda\right|^2$

$\le \left(\int_{\|\lambda/a\| > 1} 2|\hat w(\lambda)|\,d\lambda\right)^2 \le 4 \int_{\|\lambda\| > a} e^{-\gamma\|\lambda\|^r}\,d\lambda \int |\hat w(\lambda)|^2 e^{\gamma\|\lambda\|^r}\,d\lambda$,

by the Cauchy-Schwarz inequality. The second factor is finite if $w \in \mathcal{A}^{\gamma,r}(\mathbb{R}^d)$.
The first is bounded by a multiple of $e^{-\gamma a^r} a^{-r+1}$, by a change of variable
and Lemma 4.9. □



Next we turn to bounding the centered small-ball probability. According 
to general results on Gaussian processes (see [26]), this can be characterized 
in terms of the entropy of the unit ball of the RKHS. In view of Lemma 4.1 
this consists of certain analytic functions, and therefore we can bound its 
entropy by employing classical techniques as given in [25]. 

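Let $\mathbb{H}^a_1$ be the unit ball in the RKHS of $W^a = (W_{at} : t \in [0,1]^d)$, that is,
the set of functions $h \in \mathbb{H}^a$ with $\|h\|_{\mathbb{H}^a} \le 1$.

Lemma 4.5. Let $\mu$ satisfy (3.3) for some $\delta > 0$. There exists a constant
$K$, depending only on $\mu$ and $d$, such that, for $\varepsilon < 1/2$,

$\log N(\varepsilon, \mathbb{H}^a_1, \|\cdot\|_\infty) \le K a^d \left(\log\dfrac{1}{\varepsilon}\right)^{1+d}$.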

Proof. By Lemma 4.1 a typical element of $\mathbb{H}^a_1$ can be written as the
real part of the function $h_\psi : [0,1]^d \to \mathbb{C}$ given by

(4.2) $h_\psi(t) = \int e^{i(\lambda,t)}\psi(\lambda)\,\mu_a(d\lambda)$,

for $\psi : \mathbb{R}^d \to \mathbb{C}$ a function with $\int |\psi|^2\,\mu_a(d\lambda) \le 1$. We shall construct an $\varepsilon$-net
over these functions consisting of piecewise polynomials.

For $R = \delta/(3a\sqrt{d})$ let $\{t_1, \ldots, t_m\}$ be an $R/2$-net in $T = [0,1]^d$, for the
maximum norm, and let $T = \bigcup_i B_i$ be a partition of $T$ in sets $B_1, \ldots, B_m$
obtained by assigning every $t \in T$ to a closest $t_i \in \{t_1, \ldots, t_m\}$. Consider the
piecewise polynomials $P = \sum_{i=1}^m P_{i,a_i} 1_{B_i}$, for

$P_{i,a_i}(t) = \sum_{n_\cdot \le k} a_{i,n}(t - t_i)^n$.

Here the sum ranges over all multi-index vectors $n = (n_1, \ldots, n_d) \in (\mathbb{N} \cup \{0\})^d$
with $n_\cdot = n_1 + \cdots + n_d \le k$, and for $s = (s_1, \ldots, s_d) \in \mathbb{R}^d$ the notation $s^n$ is
short for $s_1^{n_1} s_2^{n_2} \cdots s_d^{n_d}$. We obtain a finite set of functions by discretizing the
coefficients $a_{i,n}$ for each $i$ and $n$ over a grid of meshwidth $\varepsilon/R^{n_\cdot}$ in the
interval $[-C/R^{n_\cdot}, C/R^{n_\cdot}]$, for given $C > 0$. The log cardinality of this set is
bounded by

$\log\left(\prod_i \prod_{n : n_\cdot \le k} \# a_{i,n}\right) \le m \log\left(\prod_{n : n_\cdot \le k} \dfrac{2C/R^{n_\cdot}}{\varepsilon/R^{n_\cdot}}\right) \le m k^d \log\left(\dfrac{2C}{\varepsilon}\right)$.

We can choose $m \le (3/(R/2))^d$. The proof is complete once it is shown that
the resulting set of functions is a $K\varepsilon$-net for constants $C$ and $K$ depending
only on $\mu$, and for $k$ of the order $\log(1/\varepsilon)$.

We can view the function $h_\psi$ as a function of the argument $it$, ranging over
the product of the imaginary axes in $\mathbb{C}^d$. In view of (3.3) and the Cauchy-Schwarz
inequality, this function can be extended to an analytic function
$z \mapsto \int e^{(\lambda,z)}\psi(\lambda)\,d\mu_a(\lambda)$ on the set $\{z \in \mathbb{C}^d : \|\mathrm{Re}\,z\| < \delta/(2a)\}$, which includes
the strip $\Omega = \{z \in \mathbb{C}^d : |\mathrm{Re}\,z_1| \le R, \ldots, |\mathrm{Re}\,z_d| \le R\}$ for $R = \delta/(3a\sqrt{d})$, and
it satisfies the uniform bound, for every $z \in \Omega$,



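By the Cauchy formula ($d$ applications of the formula in one dimension
suffice), for $C_1, \ldots, C_d$ circles of radius $R$ in the complex plane around the
coordinates $t_{i1}, \ldots, t_{id}$ of $t_i$, and with $D^n$ the partial derivative of orders
$n = (n_1, \ldots, n_d)$ and $n! = n_1! n_2! \cdots n_d!$,

$\left|\dfrac{D^n h_\psi(t_i)}{n!}\right| = \left|\dfrac{1}{(2\pi i)^d} \oint_{C_1} \cdots \oint_{C_d} \dfrac{h_\psi(z)}{(z - t_i)^{n+1}}\,dz_1 \cdots dz_d\right| \le \dfrac{C}{R^{n_\cdot}}$.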

Consequently, for any z £ Bi, a universal constant K, and appropriately 
chosen a,- 



n. >A; 



n.>k l=k+l 




-2\ k 



< KC 



n.<fc 1=1 



n.<k 



We conclude that the piecewise polynomials form a $2K\varepsilon$-net for $k$ sufficiently
large that $(2/3)^k$ is smaller than $K\varepsilon$. □

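Lemma 4.6. If the spectral measure satisfies (3.3), then for any $a_0 > 0$
there exist constants $C$ and $\varepsilon_0$ that depend only on $a_0$, $\mu$ and $d$ such
that, for $a \ge a_0$ and $\varepsilon < \varepsilon_0$,

$-\log P\left(\sup_{t \in [0,1]^d} |W_{at}| \le \varepsilon\right) \le C a^d \left(\log\dfrac{a}{\varepsilon}\right)^{1+d}$.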

Proof. This is essentially a corollary of Lemma 4.5 in the present paper 
and Theorem 2 of [26]. However, to make the dependence on the scaling 
factor a explicit it is necessary to go through the steps of the proof of the 
latter theorem. We only sketch the main steps of the long derivation. Let
$\phi_0^a(\varepsilon)$ be the left side of the lemma.

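By formula (3.19) of [26], for any $\varepsilon, \lambda > 0$,

$\phi_0^a(2\varepsilon) + \log\Phi\left(\lambda + \Phi^{-1}(e^{-\phi_0^a(\varepsilon)})\right) \le \log N\left(\dfrac{\varepsilon}{\lambda}, \mathbb{H}^a_1, \|\cdot\|_\infty\right)$.

Choosing $\lambda = \sqrt{2\phi_0^a(\varepsilon)}$, using the fact that $\Phi(\sqrt{2x} + \Phi^{-1}(e^{-x})) \ge 1/2$ for
every $x > 0$ (see Lemma 4.10), and applying Lemma 4.5 to the right of the
preceding display, we conclude that, for every $\varepsilon < 1/2$,

$\phi_0^a(2\varepsilon) + \log\dfrac{1}{2} \le K a^d \left(\log\dfrac{\sqrt{2\phi_0^a(\varepsilon)}}{\varepsilon}\right)^{1+d}$.

The (apparently) most difficult part of the proof is to show a crude bound
of the form, for $\varepsilon < \varepsilon_0$ and $a \ge a_0$, and some $\tau > 0$,

(4.3) $\phi_0^a(\varepsilon) \le C_0\left(\dfrac{a}{\varepsilon}\right)^\tau$.

Inserting this bound in the right of the preceding display gives that this is
bounded by

$K a^d \left((\tau + 1)\log\dfrac{1}{\varepsilon} + \log C_0 + \tau\log a\right)^{1+d}$.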

This implies the assertion of the lemma.

The bound (4.3) follows for fixed $a$ immediately from Proposition 2.4
of [36], whose condition is satisfied for any $\alpha > 0$ in our case, so that we
can use any $\tau > 0$. To see the dependence on $a$ we can follow the proof
of Proposition 2.4, which unfortunately is involved. We only note that the
constants in Lemma 2.1 of [36] (which is quoted from [39]) are universal and
hence cause no problems; that Lemma 2.2 of [36] (which is quoted from [44])
can be formulated to say that $\sup_{k \le n} k^\alpha e_k(u^*) \le 32 \sup_{k \le n} k^\alpha e_k(u)$ for every
$n$, without conditions, and hence only involves the constant 32; finally, the
proof of Proposition 2.2 is given in [36] and does not cause problems. □



For different values of $a$ the processes $W^a$ result from rescaling a single
Gaussian field by different amounts. This leads to a nesting property of the 
attached RKHSs. 



Lemma 4.7. Assume (3.3). If $a \le b$, then $\sqrt{a}\,\mathbb{H}_1^a \subset \sqrt{b}\,\mathbb{H}_1^b$.



Proof. This follows from the characterization of the RKHS given in 
Lemma 4.1, together with the observations 

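$$\int e^{i(\lambda,t)}\psi(\lambda)\,d\mu_a(\lambda) = \int e^{i(\lambda,t)}\Bigl(\psi\frac{f_a}{f_b}\Bigr)(\lambda)\,d\mu_b(\lambda),$$

$$\int \Bigl|\psi\frac{f_a}{f_b}\Bigr|^2 d\mu_b \le \frac{b}{a}\int |\psi|^2\,d\mu_a.$$

Here we use that $(f_a/f_b)(\lambda) = (b/a) f(\lambda/a)/f(\lambda/b) \le b/a$ by the assumed
radial monotonicity of the density $f$ of the spectral measure $\mu$. □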



If $a \downarrow 0$ the sample paths of $W^a$ tend on compacta to the constant value
$W_0$. The following lemma gives a corresponding property for the RKHSs.

Lemma 4.8. Any $h \in \mathbb{H}_1^a$ satisfies $|h(0)| \le \sqrt{\|\mu\|}$ and $|h(t) - h(0)| \le a\|t\|\tau$
for $\tau^2 = \int \|\lambda\|^2\,d\mu(\lambda)$, for every $t \in T$.



Proof. By Lemma 4.1 a typical element of $\mathbb{H}_1^a$ can be written as the
real part of $h(t) = \int e^{i(\lambda,t)}\psi(\lambda)\,d\mu_a(\lambda)$ for a function $\psi$ with $\int |\psi|^2\,d\mu_a \le 1$.
It follows that $|h(0)| \le \int |\psi|\,d\mu_a$ and $|h(t) - h(0)| \le \int |(\lambda,t)|\,|\psi|(\lambda)\,d\mu_a(\lambda)$.
Two applications of the Cauchy-Schwarz inequality conclude the proof. □
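The two inequalities of Lemma 4.8 can be checked numerically in a toy setting. The following is a minimal sketch, assuming $d = 1$ and a hypothetical discrete stand-in for the spectral measure (atoms `lam` with weights `w`); the spectral measure in the paper has a density, so this only illustrates the Cauchy-Schwarz argument and is not part of the proof. The helper names `rkhs_element` and `check_lemma_4_8` are ours.

```python
import cmath
import math
import random

def rkhs_element(t, a, lam, w, psi):
    """Real part of sum_j exp(i * a * lam_j * t) * psi_j * w_j (case d = 1)."""
    return sum((cmath.exp(1j * a * l * t) * p * wj).real
               for l, wj, p in zip(lam, w, psi))

def check_lemma_4_8(a, seed=0, n_atoms=25, n_points=200):
    """Check |h(0)| <= sqrt(||mu||) and |h(t) - h(0)| <= a |t| tau on a grid of [0, 1]."""
    rng = random.Random(seed)
    # Hypothetical discrete stand-in for the spectral measure mu.
    lam = [rng.uniform(-3.0, 3.0) for _ in range(n_atoms)]
    w = [rng.uniform(0.1, 1.0) for _ in range(n_atoms)]
    # Random complex psi, rescaled so that sum_j |psi_j|^2 w_j = 1.
    psi = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_atoms)]
    norm = math.sqrt(sum(abs(p) ** 2 * wj for p, wj in zip(psi, w)))
    psi = [p / norm for p in psi]

    total_mass = sum(w)                                    # ||mu||
    tau = math.sqrt(sum(l * l * wj for l, wj in zip(lam, w)))
    h0 = rkhs_element(0.0, a, lam, w, psi)
    ok = abs(h0) <= math.sqrt(total_mass) + 1e-12
    for k in range(n_points + 1):
        t = k / n_points
        ht = rkhs_element(t, a, lam, w, psi)
        ok = ok and abs(ht - h0) <= a * abs(t) * tau + 1e-12
    return ok
```

For small `a` the second inequality forces the element to be nearly constant on $[0,1]$, which is the property used later to cover the sets $M\mathbb{H}_1^a$ with $a < \delta$.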



The final two lemmas in this section bound the tail probabilities of the
scaling variable $A$, and give a bound on the normal quantile function, for
easy reference.

Lemma 4.9. If the random variable $A$ has a density $g$ that satisfies (3.4)
for some $q \ge 0$, then for $a^d(\log a)^q \ge 2|p - d + 1|/(D_2 d)$ and $a \ge e$,

$$P(A > a) \le \frac{2C_2\,a^{p-d+1}\exp\bigl(-D_2 a^d(\log a)^q\bigr)}{D_2 d(\log a)^q}.$$

Proof. Set $j_{p,r}(s) = s^p\exp(-D_2 s^d(\log s)^q)(\log s)^r$ and $J_{p,r}(a) =
\int_a^\infty j_{p,r}(s)\,ds$. The derivative of the function $j_{p,0}$ can, with the help of the
chain rule, be expressed as the sum of three terms. By integrating this identity
we see that

$$j_{p,0}(a) = D_2 d\,J_{p+d-1,q}(a) + D_2 q\,J_{p+d-1,q-1}(a) - p\,J_{p-1,0}(a).$$

The middle term on the right is nonnegative (the third is negative if and
only if $p > 0$). By the transformation $p + d - 1 \to p$ we conclude that

$$D_2 d\,J_{p,q}(a) - |p - d + 1|\,J_{p-d,0}(a) \le j_{p-d+1,0}(a).$$

Here $J_{p,q}(a) \ge (\log a)^q J_{p,0}(a)$ and $J_{p-d,0}(a) \le a^{-d}J_{p,0}(a)$. By substituting
these inequalities in the left-hand side and rearranging we obtain the bound
on $P(A > a) \le C_2 J_{p,0}(a)$ asserted by the lemma. □
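The integration-by-parts identity and the resulting tail bound in this proof can be sanity-checked numerically. The sketch below is only illustrative: the parameter values $p = 2$, $d = 2$, $q = 1$, $D_2 = 1$ used in the usage note are arbitrary assumptions, the tail integral is truncated to a finite window (the integrand is negligible beyond it), and the helper names `j`, `J` and `tail_bound` are ours.

```python
import math

def j(s, p, r, d, q, D2):
    """j_{p,r}(s) = s^p exp(-D2 s^d (log s)^q) (log s)^r, for s > 1."""
    ls = math.log(s)
    return s ** p * math.exp(-D2 * s ** d * ls ** q) * ls ** r

def J(a, p, r, d, q, D2, width=10.0, n=20000):
    """Composite Simpson approximation of the integral of j_{p,r} over [a, a + width].

    The integrand decays faster than exponentially, so for a >= e the part
    beyond a + width is far below floating-point resolution.
    """
    h = width / n
    total = j(a, p, r, d, q, D2) + j(a + width, p, r, d, q, D2)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * j(a + k * h, p, r, d, q, D2)
    return total * h / 3.0

def tail_bound(a, p, d, q, D2):
    """2 a^{p-d+1} exp(-D2 a^d (log a)^q) / (D2 d (log a)^q), the bound on J_{p,0}(a)."""
    la = math.log(a)
    return 2 * a ** (p - d + 1) * math.exp(-D2 * a ** d * la ** q) / (D2 * d * la ** q)

def identity_gap(a, p, d, q, D2):
    """Relative gap between the two sides of the integrated three-term identity."""
    lhs = j(a, p, 0, d, q, D2)
    rhs = (D2 * d * J(a, p + d - 1, q, d, q, D2)
           + D2 * q * J(a, p + d - 1, q - 1, d, q, D2)
           - p * J(a, p - 1, 0, d, q, D2))
    return abs(lhs - rhs) / lhs
```

With the illustrative parameters above, `identity_gap(math.e, 2.0, 2.0, 1.0, 1.0)` should be at the level of the quadrature error, and `J(a, 2.0, 0, 2.0, 1.0, 1.0)` should not exceed `tail_bound(a, 2.0, 2.0, 1.0, 1.0)` for $a \ge e$. Multiplying by $C_2$ gives the bound on $P(A > a)$.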

Lemma 4.10. The standard normal distribution function $\Phi$ satisfies $\Phi(x) \le
\exp(-x^2/2)$ for $x < 0$, and $-\sqrt{2\log(1/u)} \le \Phi^{-1}(u)$ for $u \in (0,1)$, and $\Phi^{-1}(u) \le
-\frac{1}{2}\sqrt{\log(1/u)}$ for $u \in (0, 1/4)$.
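The bounds of Lemma 4.10 are easy to check on a grid. The sketch below uses the standard-library `statistics.NormalDist`; it reads the third inequality as $\Phi^{-1}(u) \le -\frac{1}{2}\sqrt{\log(1/u)}$, where the constant $\frac{1}{2}$ is our reading of the statement, and it also checks the consequence $\Phi(\sqrt{2x} + \Phi^{-1}(e^{-x})) \ge 1/2$ used in the proof of Lemma 4.6. The function names are ours.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution

def check_cdf_bound(x):
    """Phi(x) <= exp(-x^2 / 2), intended for x < 0."""
    return PHI.cdf(x) <= math.exp(-x * x / 2)

def check_quantile_lower(u):
    """-sqrt(2 log(1/u)) <= Phi^{-1}(u), intended for u in (0, 1)."""
    return -math.sqrt(2 * math.log(1 / u)) <= PHI.inv_cdf(u)

def check_quantile_upper(u):
    """Phi^{-1}(u) <= -(1/2) sqrt(log(1/u)), intended for u in (0, 1/4)."""
    return PHI.inv_cdf(u) <= -0.5 * math.sqrt(math.log(1 / u))

def check_half_bound(x):
    """Phi(sqrt(2x) + Phi^{-1}(e^{-x})) >= 1/2, intended for x > 0."""
    return PHI.cdf(math.sqrt(2 * x) + PHI.inv_cdf(math.exp(-x))) >= 0.5
```

The last check follows from the lower quantile bound with $u = e^{-x}$, since then $\Phi^{-1}(e^{-x}) \ge -\sqrt{2x}$.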

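5. Proof of Theorem 3.1. For a given $a > 0$ define centered and decentered
concentration functions of the process $W^a = (W_{at} : t \in [0,1]^d)$ by

$$\phi_0^a(\epsilon) = -\log P(\|W^a\|_\infty \le \epsilon),$$

$$\phi_{w_0}^a(\epsilon) = \inf_{h \in \mathbb{H}^a : \|h - w_0\|_\infty \le \epsilon} \|h\|_{\mathbb{H}^a}^2 - \log P(\|W^a\|_\infty \le \epsilon).$$

Then $P(\|W^a\|_\infty \le \epsilon) = \exp(-\phi_0^a(\epsilon))$ by definition, and by results of [27] (cf.
Lemma 5.3 of [48]),

(5.1) $$P(\|W^a - w_0\|_\infty \le 2\epsilon) \ge e^{-\phi_{w_0}^a(\epsilon)}.$$

By Lemma 4.6 we have that $\phi_0^a(\epsilon) \le C_4 a^d(\log(a/\epsilon))^{1+d}$ for $a \ge a_0$ and $\epsilon < \epsilon_0$,
where the constants $a_0, \epsilon_0, C_4$ depend only on $\mu$ and $w$.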

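For $\mathbb{B}_1$ the unit ball of $C[0,1]^d$ and given positive constants $M, r, \delta, \epsilon$ set

$$B = \Bigl(M\sqrt{\frac{r}{\delta}}\,\mathbb{H}_1^r + \epsilon\mathbb{B}_1\Bigr) \cup \Bigl(\bigcup_{a < \delta}(M\mathbb{H}_1^a) + \epsilon\mathbb{B}_1\Bigr).$$

By Lemma 4.7 the set $B$ contains the set $M\mathbb{H}_1^a + \epsilon\mathbb{B}_1$ for any $a \in [\delta, r]$.
This is true also for $a < \delta$, trivially, by the definition of $B$. Consequently, by
Borell's inequality (see [5] or Theorem 5.1 in [48]), for any $a \le r$,

$$P(W^a \notin B) \le P(W^a \notin M\mathbb{H}_1^a + \epsilon\mathbb{B}_1) \le 1 - \Phi\bigl(\Phi^{-1}(e^{-\phi_0^a(\epsilon)}) + M\bigr) \le 1 - \Phi\bigl(\Phi^{-1}(e^{-\phi_0^r(\epsilon)}) + M\bigr),$$

because $e^{-\phi_0^a(\epsilon)} = P(\sup_{t \in aT}|W_t| \le \epsilon)$ is decreasing in $a$. For

$$M \ge -2\Phi^{-1}(e^{-\phi_0^r(\epsilon)}),$$

the right-hand side is bounded by $1 - \Phi(M/2) \le e^{-M^2/8}$. The latter condition
is certainly satisfied if (cf. Lemma 4.10),

$$M \ge 4\sqrt{\phi_0^r(\epsilon)} \quad\text{and}\quad e^{-\phi_0^r(\epsilon)} < 1/4.$$

Here $e^{-\phi_0^r(\epsilon)} \le e^{-\phi_0^1(\epsilon)}$ for $r > 1$ and is certainly smaller than 1/4 if $\epsilon$ is
smaller than some fixed $\epsilon_1$. Therefore, in view of Lemma 4.6 the inequalities
are satisfied if

(5.2) $$M^2 \ge 16C_4 r^d(\log(r/\epsilon))^{1+d}, \qquad r > 1, \qquad \epsilon < \epsilon_1 \wedge \epsilon_0.$$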
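The step from Borell's inequality to the bound $e^{-M^2/8}$ can be illustrated numerically. The following is a minimal sketch in which the value `phi_r` stands for $\phi_0^r(\epsilon)$ and is simply supplied by the caller (computing a genuine small ball exponent is outside its scope); the function names are ours.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution

def escape_probability_bound(phi_r, M):
    """1 - Phi(Phi^{-1}(e^{-phi_r}) + M), evaluated as Phi(-(...)) for tail accuracy."""
    return PHI.cdf(-(PHI.inv_cdf(math.exp(-phi_r)) + M))

def condition_holds(phi_r, M):
    """The sufficient condition: M >= 4 sqrt(phi_r) and e^{-phi_r} < 1/4."""
    return M >= 4 * math.sqrt(phi_r) and math.exp(-phi_r) < 0.25

def gaussian_tail_target(M):
    """The claimed upper bound exp(-M^2 / 8)."""
    return math.exp(-M * M / 8)
```

Whenever `condition_holds(phi_r, M)` is true, `escape_probability_bound(phi_r, M)` should not exceed `gaussian_tail_target(M)`. The reason is that $\Phi^{-1}(e^{-\phi}) \ge -\sqrt{2\phi} \ge -M/2$ under the condition, so the argument of $\Phi$ is at least $M/2$.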

In view of Lemma 4.9, for $r$ larger than a positive constant depending on $d$
and the density of $A$ only,

(5.3) $$P(W^A \notin B) \le P(A > r) + \int_0^r P(W^a \notin B)\,g(a)\,da \le \frac{2C_2\,r^{p-d+1}e^{-D_2 r^d\log^q r}}{D_2 d\log^q r} + e^{-M^2/8}.$$

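This inequality is true for any $B = B_{M,r,\delta,\epsilon}$ with $M, r, \delta, \epsilon$ satisfying (5.2).
By Lemma 4.5, for $M\sqrt{r/\delta} > 2\epsilon$ and $r > a_0$,

$$\log N\Bigl(2\epsilon, M\sqrt{\frac{r}{\delta}}\,\mathbb{H}_1^r + \epsilon\mathbb{B}_1, \|\cdot\|_\infty\Bigr) \le \log N\Bigl(\epsilon, M\sqrt{\frac{r}{\delta}}\,\mathbb{H}_1^r, \|\cdot\|_\infty\Bigr) \le K r^d\Bigl(\log\Bigl(\frac{M\sqrt{r/\delta}}{\epsilon}\Bigr)\Bigr)^{1+d}.$$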



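By Lemma 4.8 every element of $M\mathbb{H}_1^a$ for $a < \delta$ is within uniform distance
$\delta\sqrt{d}\,\tau M$ of a constant function for a constant in the interval $[-E, E]$, for
$E = M\sqrt{\|\mu\|}$. It follows that, for $\epsilon > \delta\sqrt{d}\,\tau M$,

$$N\Bigl(3\epsilon, \bigcup_{a < \delta}(M\mathbb{H}_1^a) + \epsilon\mathbb{B}_1, \|\cdot\|_\infty\Bigr) \le N(\epsilon, [-E, E], |\cdot|) \le \frac{2E}{\epsilon}.$$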



The covering number of a union is bounded by the sum of the covering
numbers. Therefore, with the choice $\delta = \epsilon/(2\sqrt{d}\,\tau M)$, together the last two
displays yield, since $\log(x + y) \le \log(2(x \vee y)) \le \log x + 2\log y$ for $x \ge 1$, $y \ge 2$,
for $2E/\epsilon \ge 2$,

(5.4) $$\log N(3\epsilon, B, \|\cdot\|_\infty) \le K r^d\Bigl(\log\Bigl(\frac{M^{3/2}\sqrt{2\tau r}\,d^{1/4}}{\epsilon^{3/2}}\Bigr)\Bigr)^{1+d} + 2\log\frac{2M\sqrt{\|\mu\|}}{\epsilon}.$$

This inequality is valid for any $B = B_{M,r,\delta,\epsilon}$ with $\delta = \epsilon/(2\sqrt{d}\,\tau M)$, and any
$M, r, \epsilon$ with

(5.5) $$M^{3/2}\sqrt{2\tau r}\,d^{1/4} > 2\epsilon^{3/2}, \qquad r > a_0, \qquad M\sqrt{\|\mu\|} > \epsilon.$$
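The algebra behind (5.4) and (5.5) is a substitution of $\delta = \epsilon/(2\sqrt{d}\,\tau M)$ into $M\sqrt{r/\delta}/\epsilon$, combined with an elementary logarithm inequality. A small sketch with arbitrary illustrative inputs can confirm both; the function names and the numerical values in the usage note are our own assumptions.

```python
import math

def entropy_arguments(M, r, eps, d, tau, mu_norm):
    """Quantities entering the entropy bound for delta = eps / (2 sqrt(d) tau M)."""
    delta = eps / (2 * math.sqrt(d) * tau * M)
    direct = M * math.sqrt(r / delta) / eps                 # M sqrt(r/delta) / eps
    closed = M ** 1.5 * math.sqrt(2 * tau * r) * d ** 0.25 / eps ** 1.5
    two_E_over_eps = 2 * M * math.sqrt(mu_norm) / eps       # 2E / eps with E = M sqrt(||mu||)
    return delta, direct, closed, two_E_over_eps

def log_sum_bound_holds(x, y):
    """log(x + y) <= log(2 max(x, y)) <= log x + 2 log y, intended for x >= 1, y >= 2."""
    lhs = math.log(x + y)
    mid = math.log(2 * max(x, y))
    rhs = math.log(x) + 2 * math.log(y)
    return lhs <= mid + 1e-12 and mid <= rhs + 1e-12
```

For instance, `entropy_arguments(10.0, 5.0, 0.1, 3, 1.7, 2.0)` returns `direct` and `closed` that agree up to rounding, and the chosen $\delta$ satisfies $\delta\sqrt{d}\,\tau M = \epsilon/2 < \epsilon$, which is the condition needed for the covering of the sets with $a < \delta$.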



In the remainder of the proof we make special choices for these parameters,
depending on the assumption on $w_0$.

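5.1. Hölder smoothness. Suppose that $w_0 \in C^\alpha[0,1]^d$ for some $\alpha > 0$. In
view of Lemmas 4.3 and 4.6, for every $a_0$ there exist positive constants
$\epsilon_0 < 1/2$, $C$, $D$ and $K$ that depend on $w_0$ and $\mu$ only such that, for $a > a_0$,
$\epsilon < \epsilon_0$ and $\epsilon > Ca^{-\alpha}$,

$$\phi_{w_0}^a(\epsilon) \le Da^d + C_4 a^d\Bigl(\log\frac{a}{\epsilon}\Bigr)^{1+d} \le K_1 a^d\Bigl(\log\frac{a}{\epsilon}\Bigr)^{1+d},$$

for $K_1$ depending on $a_0$, $\mu$ and $d$ only. Therefore, for $\epsilon < \epsilon_0 \wedge Ca_0^{-\alpha}$ [so that
$(C/\epsilon)^{1/\alpha} > a_0$], by (5.1),

$$P(\|W^A - w_0\|_\infty \le 2\epsilon) \ge \int_0^\infty e^{-\phi_{w_0}^a(\epsilon)}g(a)\,da \ge \int_{(C/\epsilon)^{1/\alpha}}^{2(C/\epsilon)^{1/\alpha}} e^{-K_1 a^d\log^{1+d}(a/\epsilon)}g(a)\,da \ge C_1 e^{-K_2(1/\epsilon)^{d/\alpha}(\log(1/\epsilon))^{(1+d)\vee q}}\Bigl(\frac{C}{\epsilon}\Bigr)^{(p+1)/\alpha},$$

in view of (3.4), for a constant $K_2$ that depends only on $K_1, C, D_1, d, \alpha, q$.
We conclude that $P(\|W^A - w_0\|_\infty \le \epsilon_n) \ge \exp(-n\epsilon_n^2)$ for $\epsilon_n$ a large multiple
of $n^{-1/(2+d/\alpha)}(\log n)^\gamma$, for $\gamma = ((1 + d) \vee q)/(2 + d/\alpha)$, and sufficiently large
$n$.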
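The rate in the Hölder case comes from balancing $n\epsilon_n^2$ against an exponent of the order $\epsilon_n^{-d/\alpha}$ times a power of a logarithm. The sketch below computes $\epsilon_n$ (with multiplicative constant 1, an arbitrary choice) and lets one confirm that $n\epsilon_n^2 = \epsilon_n^{-d/\alpha}(\log n)^{(1+d)\vee q}$ holds exactly for this choice of $\gamma$. Replacing $\log(1/\epsilon_n)$ by $\log n$ is a simplification that only affects constants. The function name `holder_rate` is ours.

```python
import math

def holder_rate(n, alpha, d, q):
    """Return (eps_n, gamma, kappa) with eps_n = n^{-1/(2 + d/alpha)} (log n)^gamma."""
    kappa = max(1 + d, q)                  # (1 + d) v q
    gamma = kappa / (2 + d / alpha)
    eps_n = n ** (-1 / (2 + d / alpha)) * math.log(n) ** gamma
    return eps_n, gamma, kappa

def balance_gap(n, alpha, d, q):
    """Relative gap between n eps_n^2 and eps_n^{-d/alpha} (log n)^kappa."""
    eps_n, _, kappa = holder_rate(n, alpha, d, q)
    lhs = n * eps_n ** 2
    rhs = eps_n ** (-d / alpha) * math.log(n) ** kappa
    return abs(lhs - rhs) / lhs
```

Note that $n^{-1/(2+d/\alpha)} = n^{-\alpha/(2\alpha+d)}$, the minimax rate for $\alpha$-smooth functions of $d$ arguments, so $\epsilon_n$ differs from it only by the logarithmic factor.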

By (5.2)-(5.3) $P(W^A \notin B)$ is bounded above by a multiple of $\exp(-C_0 n\epsilon_n^2)$
for an arbitrarily large constant $C_0$ if (5.2) holds and

(5.6) $$D_2 r^d(\log r)^q \ge 2C_0 n\epsilon_n^2, \qquad r^{p-d+1} \le e^{C_0 n\epsilon_n^2}, \qquad M^2 \ge 8C_0 n\epsilon_n^2.$$

Given $C_0$ we first choose $r = r_n$ as the minimal solution to the first equation,
and next we choose $M = M_n$ to satisfy the third equation and (5.2). The
second equation is then automatically satisfied, for large $n$.

With these choices of $M$ and $r$ and $\epsilon_n$ bounded below by a power of $n$ the
right-hand side of (5.4) is bounded by a multiple of $r_n^d(\log n)^{1+d} + \log n$. This
is bounded by $n\bar\epsilon_n^2$ for $\bar\epsilon_n^2$ a large multiple of $(r_n^d/n)(\log n)^{1+d}$. Inequalities
(5.5) are clearly satisfied.

5.2. Infinite smoothness, $r \ge \nu$. Suppose that $w_0$ is the restriction of a
function $w_0 \in \mathcal{A}^{\gamma,r}(\mathbb{R}^d)$ for $r \ge \nu$, and that the spectral density is bounded
below by a multiple of $\exp(-D_3\|\lambda\|^\nu)$ for some positive constants $D_3$ and
$\nu$. By combining the first part of Lemma 4.4 and Lemma 4.6, we see that
there exist positive constants $a_0 < a_1$, $\epsilon_0$, $K_1$ and $C_4$ that depend on $w_0$ and
$\mu$ only such that, for $a \in [a_0, a_1]$ and $\epsilon < \epsilon_0$,

$$\phi_{w_0}^a(\epsilon) \le K_1 + C_4 a^d\Bigl(\log\frac{a}{\epsilon}\Bigr)^{1+d}.$$

Consequently, by (5.1),

$$P(\|W^A - w_0\|_\infty \le 2\epsilon) \ge \int_0^\infty e^{-\phi_{w_0}^a(\epsilon)}g(a)\,da.$$

We conclude that $P(\|W^A - w_0\|_\infty \le \epsilon_n) \ge \exp(-n\epsilon_n^2)$ for $\epsilon_n$ a large multiple
of $n^{-1/2}(\log n)^{(d+1)/2}$, and sufficiently large $n$.

Next we choose $B$ of the form as before, with $r$ and $M$ solving (5.6) and
satisfying (5.2), that is, $r^d$ and $M^2$ large multiples of $(\log n)^{d+1}$. Then (5.2)-(5.3)
show that $P(W^A \notin B)$ is bounded above by a multiple of $\exp(-C_0 n\epsilon_n^2)$,
and the right-hand side of (5.4) is bounded by a multiple of $r^d(\log(1/\epsilon) +
\log\log n)^{1+d} + \log(1/\epsilon) + \log\log n$. For $\epsilon = \bar\epsilon_n$ a large multiple of $n^{-1/2}(\log n)^{d+1}$
this is bounded above by $n\bar\epsilon_n^2$.

5.3. Infinite smoothness, $r < \nu$. Consider the situation as in the preceding
section, but now with $r < \nu$. Combining the second part of Lemma 4.4
and Lemma 4.6, we see that there exist positive constants $a_0$, $\epsilon_0$, $C$, $D$, $K_1$
and $C_4$ that depend on $w_0$ and $\mu$ only and $\gamma' > \gamma$ such that, for $a > a_0$, $\epsilon < \epsilon_0$
and $C\exp(-\gamma' a^r) < \epsilon$,

$$\phi_{w_0}^a(\epsilon) \le Da^d + C_4 a^d\Bigl(\log\frac{a}{\epsilon}\Bigr)^{1+d}.$$

Consequently, by (5.1), for constants $D_1, D_2$ that depend on $w_0$ and $\mu$ only,

$$P(\|W^A - w_0\|_\infty \le 2\epsilon) \ge \int_{(\log(C/\epsilon)/\gamma')^{1/r}}^{\infty} e^{-\phi_{w_0}^a(\epsilon)}g(a)\,da.$$

We conclude that $P(\|W^A - w_0\|_\infty \le \epsilon_n) \ge \exp(-n\epsilon_n^2)$ for $\epsilon_n$ a large multiple
of $n^{-1/2}(\log n)^{d/(2r)+(d+1)/2}$, and sufficiently large $n$.

Next we choose $B$ of the form as before, with $r$ and $M$ solving (5.6),
that is, $r^d$ and $M^2$ large multiples of $(\log n)^{d/r+d+1}$. Then (5.2) and (5.3)
show that $P(W^A \notin B)$ is bounded above by a multiple of $\exp(-C_0 n\epsilon_n^2)$,
and the right-hand side of (5.4) is bounded by a multiple of $r^d(\log(1/\epsilon) +
\log\log n)^{1+d} + \log(1/\epsilon) + \log\log n$. For $\epsilon = \bar\epsilon_n$ a large multiple of
$n^{-1/2}(\log n)^{d+1+d/(2r)}$ this is bounded above by $n\bar\epsilon_n^2$.